torch.compile fullgraph compatibility for Hunyuan Video#11457
Conversation
Pull Request Overview
This PR improves fullgraph compatibility for Hunyuan Video by replacing a loop-based attention mask construction with a vectorized approach.
- Replaces the manual per-batch loop with a vectorized masked_fill operation.
- Updates the attention mask initialization from zeros to ones and adds appropriate unsqueezing for broadcasting.
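The two approaches are equivalent; the vectorized form avoids a Python-level loop that can cause graph breaks under `torch.compile(fullgraph=True)`. A minimal sketch of the idea (the `effective_lengths` values and the final broadcast shape are illustrative assumptions, not the exact code in `transformer_hunyuan_video.py`):

```python
import torch

batch_size, sequence_length = 2, 6
# Hypothetical per-sample valid token counts (padding after these positions).
effective_lengths = torch.tensor([4, 6])

# Loop-based construction: prone to graph breaks under fullgraph compilation.
mask_loop = torch.zeros(batch_size, sequence_length, dtype=torch.bool)
for i in range(batch_size):
    mask_loop[i, : effective_lengths[i]] = True

# Vectorized equivalent: start from ones, then mask out padding positions.
mask_vec = torch.ones(batch_size, sequence_length, dtype=torch.bool)
positions = torch.arange(sequence_length)
# positions: [1, S]; effective_lengths: [B, 1] -> broadcast to [B, S].
mask_vec = mask_vec.masked_fill(
    positions.unsqueeze(0) >= effective_lengths.unsqueeze(1), False
)

# Unsqueeze so the mask broadcasts over head/query dims, e.g. [B, 1, 1, S].
attention_mask = mask_vec.unsqueeze(1).unsqueeze(1)
```

Both constructions produce the same boolean mask; only the vectorized one stays inside a single compiled graph.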
Comments suppressed due to low confidence (1)
src/diffusers/models/transformers/transformer_hunyuan_video.py:1071
- [nitpick] The refactored attention mask construction is more efficient; consider adding an inline comment explaining why the mask is initialized with ones and then pruned with masked_fill, for clarity.
attention_mask = torch.ones(batch_size, sequence_length, device=hidden_states.device, dtype=torch.bool)
sayakpaul
left a comment
Looks solid, thanks for working on this!
To confirm, I first checked out the PR branch of #11431, merged this PR's branch into it, and then ran `RUN_SLOW=1 RUN_COMPILE=1 pytest tests/models/transformers/test_models_transformer_hunyuan_video.py -k "test_torch_compile_recompilation_and_graph_break"`. Everything was green.
Fixes #11431 (review)
Will run the full model for testing sometime soon.
testing code